deprecate histogram functionality #6842

Closed

simonbyrne
Contributor

The new Histogram type is now in StatsBase.jl. This deprecates the histogram functions (hist, hist!, histrange, midpoints) from Base. See discussion in #6601.

@JeffBezanson
Member

This might be the way forward but I don't think we're going to do this in 0.3.

@ViralBShah
Member

Bump. Now is the time to do this.

@lindahua
Contributor

lindahua commented Nov 6, 2014

+1

@johnmyleswhite
Member

Great idea.

@pao mentioned this pull request Nov 9, 2014
@timholy
Member

timholy commented Nov 9, 2014

I support this too, but one concern is that the implementation in StatsBase looks like it will probably be painfully slow. Has anyone benchmarked it vs the one in Base? (esp. with respect to #8952)

@lindahua
Contributor

@timholy I think the implementation in StatsBase has yet to be optimized. What about moving your PR there?

@timholy
Member

timholy commented Nov 10, 2014

I'll give that a whirl (after a couple of other priorities).

@ViralBShah
Member

I have thought about this, and generally have mixed feelings about not being able to do histograms in base julia. It is common enough that one may not want to install StatsBase and use it from a module.

I am ok with renaming and better APIs, but perhaps we should move Histogram from StatsBase to Base rather than the other way around.

@ViralBShah added the "needs decision" label (a decision on this change is needed) Dec 18, 2014
@StefanKarpinski
Member

Is it just me or does anyone else find the Matlab-style hist function really hard to use? If I do hist(v) where v is a vector of integers, it seems pretty obvious that I would want the values to be the bins. If we're going to have this functionality in base, I would want it to be better, more generic and faster.

@stevengj
Member

Does one use hist much other than for plotting?

@ivarne
Member

ivarne commented Dec 18, 2014

I tried to use hist to count the number of occurrences of each unique value in a vector, but I did not succeed. Is there a different function for doing that, or can it be included in hist?

@johnmyleswhite
Member

We call that countmap in StatsBase.

Coming from a statistics background, I'm not really excited about mixing countmap and hist together. Histograms are designed for density estimation. Counting unique items may be related to density estimation (especially for finite sets), but I think of them as conceptually distinct tools.
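
The distinction drawn above can be sketched in a few lines. This is a language-neutral illustration in Python using only the standard library, not the StatsBase or Base implementation; the names `count_map` and `bin_counts` are hypothetical, and the half-open bin convention `[lo, hi)` is an assumption chosen for the example.

```python
from bisect import bisect_right
from collections import Counter


def count_map(values):
    """Frequency table: one entry per distinct value that occurs."""
    return dict(Counter(values))


def bin_counts(values, edges):
    """Histogram counts: one entry per bin [edges[i], edges[i+1]).

    `edges` must be sorted ascending. Values outside the edges are ignored
    (an assumption of this sketch, not a statement about any library).
    """
    counts = [0] * (len(edges) - 1)
    for v in values:
        i = bisect_right(edges, v) - 1  # index of the bin whose left edge is <= v
        if 0 <= i < len(counts):
            counts[i] += 1
    return counts


data = [1, 1, 2, 5, 5, 5]
print(count_map(data))              # keyed by the observed values
print(bin_counts(data, [0, 3, 6]))  # keyed by position in the chosen edges
```

The frequency table is determined by the data alone, while the binned counts depend on edges chosen by the caller, which is one way to see why the two are treated as separate tools.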

@ViralBShah
Member

Matlab seems to have deprecated hist and histc in favor of histogram and histcounts. Not that it matters here in any way, but histogram draws, whereas you can get edges and such with histcounts. I don't think one needs a separate command for plotting histograms.

@andreasnoack
Member

On this issue, I agree with @johnmyleswhite and consider a histogram a density estimate and what @StefanKarpinski asks for a frequency count or countmap (I think the latter word is more common in machine learning).

I also think of a histogram as the graphical representation and have been surprised by the hist in base. Therefore, I wonder if there is a separate interpretation of histograms outside classical statistics. Yesterday we got this question on the stats list:

https://groups.google.com/forum/#!topic/julia-stats/CO3Pgc89Y7A

which was a use I was not aware of. Is it also used like this in CS?

@lindahua
Contributor

Counting the occurrences of discrete items and summarizing the distribution of continuous values using bins (i.e. constructing histograms) are different concepts and usually used in different contexts. Obviously, we should use different functions for them.

@ViralBShah added this to the 0.4 milestone Feb 1, 2015
@ViralBShah
Member

We should remove this from Base in 0.4 in favour of the functionality in StatsBase.

@jakebolewski
Member

Bump. A couple of the tests rely on hist functionality; should they be rewritten?

@ViralBShah
Member

We can comment them out for now, and rewrite them, or move those tests to StatsBase.

@IainNZ
Member

IainNZ commented Feb 11, 2015

+1 for removing, and +1 to the older comments that it invites confusion (I tried to use it in the count-uniques sense earlier today...). StatsBase is great and very widely used - most users will have it installed anyway.

@BobPortmann
Contributor

I'm fine with moving histogram functionality to StatsBase, but do have some issues with the implementation in StatsBase.

  1. There does not seem to be a function to make an N-dimensional histogram from an Array{T,N}. Instead one must input an NTuple{N,AbstractVector}. This seems strange to me, as any time I would want to compute an N-dimensional histogram the data would be in an Array. Also, the edges need to be input as an NTuple{N,AbstractVector} instead of an Array{T,N}, which seems awkward. Maybe I just didn't look in the right place?
  2. Why is the function called "fit"? A histogram is computed directly from the input data and the edges. There is no fitting involved. I would much prefer to call the function hist or histogram as is more common and much more discoverable.
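
The two input layouts contrasted in point 1 can be sketched as follows. This is an illustration in Python with the standard library only, not the StatsBase interface: `histogram_nd` and `histogram_nd_from_rows` are hypothetical names, the "array" layout is assumed to mean one observation per row, and bins are assumed half-open `[lo, hi)`.

```python
from bisect import bisect_right


def histogram_nd(columns, edges):
    """N-dimensional bin counts from a tuple of N coordinate sequences.

    `columns[d]` holds coordinate d of every point; `edges[d]` holds the
    sorted bin edges along dimension d. Returns a dict mapping a tuple of
    bin indices to a count. Points outside the edges in any dimension are
    skipped (an assumption of this sketch).
    """
    counts = {}
    for point in zip(*columns):
        idx = []
        for x, e in zip(point, edges):
            i = bisect_right(e, x) - 1
            if not 0 <= i < len(e) - 1:
                break  # out of range along this dimension
            idx.append(i)
        else:
            key = tuple(idx)
            counts[key] = counts.get(key, 0) + 1
    return counts


def histogram_nd_from_rows(rows, edges):
    """Same counts, but from a matrix-like input with one point per row."""
    return histogram_nd(tuple(zip(*rows)), edges)


xs = [0.5, 1.5, 1.5, 2.5]
ys = [0.5, 0.5, 1.5, 9.0]
edges = ([0, 1, 2, 3], [0, 1, 2])
print(histogram_nd((xs, ys), edges))
print(histogram_nd_from_rows(list(zip(xs, ys)), edges))
```

In this sketch the row-oriented entry point is a thin wrapper that transposes the input, so accepting both layouts costs little; whether that is appropriate for StatsBase is a design question for its maintainers.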

@johnmyleswhite
Member

Having a function called hist is fine. But a histogram is both a descriptive statistic and an inferential statistic, so it's certainly a type of model that one fits to the data.

@simonbyrne
Contributor Author

I've updated the PR.

@BobPortmann Those are good points: would you mind opening an issue on StatsBase?

@ViralBShah
Member

Bump.

@quinnj
Member

quinnj commented May 9, 2016

This was a 0.4.x milestone PR; still relevant? Should we remove hist & friends before 0.5 release?

@ViralBShah modified the milestones: 0.5.0, 0.4.x May 9, 2016
@ViralBShah
Member

Marking 0.5.0 for the triage team.

@simonbyrne
Contributor Author

Closed in favour of #16450

@simonbyrne closed this May 19, 2016